CORRECTING CAMERA SHAKE BY INCREMENTAL SPARSE APPROXIMATION 



Paul Shearer, Anna C. Gilbert, Alfred O. Hero III 
University of Michigan, Ann Arbor 



CO 

O 

(N 

tin 



> 
U 

o 



> 
CO 

o 

O 

cn 



ABSTRACT 

The problem of deblurring an image when the blur kernel 
is unknown remains challenging after decades of work. Re- 
cently there has been rapid progress on correcting irregular 
blur patterns caused by camera shake, but there is still much 
room for improvement. We propose a new blind deconvo- 
lution method using incremental sparse edge approximation 
to recover images blurred by camera shake. We estimate the 
blur kernel first from only the strongest edges in the image, 
then gradually refine this estimate by allowing for weaker and 
weaker edges. Our method competes with the benchmark de- 
blurring performance of the state-of-the-art while being sig- 
nificantly faster and easier to generalize. 

1. INTRODUCTION 

In the problem of blind image deconvolution, we are given 
a blurry image y and challenged to determine an estimate x 
of the unknown sharp image x^^^ without knowledge of the 
blur kernel k^^^. In the simplest model of blur, y is formed by 
convolving x^^^ with k^^^ and adding noise n: 



y = e 



(1) 



This convolution model assumes spatially uniform blur, 
which is frequently violated due to slight camera rotations 
and out-of-plane effects fP] . Still, the uniform model works 
surprisingly well and methods for it can be extended to handle 
nonuniform blur lEHS). 

Even with uniform blur by a single kernel, the blind 
deconvolution problem is highly underdetermined and addi- 
tional assumptions must be made to obtain a solution. These 
assumptions are often imposed most conveniently by moving 
the problem into a filter space. We define filters {/^jf^^^ and 



set y^ = f^^y and x^^^ 



, so that 



^true ^ ^tme 



(2) 



for 7 G [L] = {1, . . . , L}. Defining x^^ 
y = {yj}i;=i, and {k * x^^^)^ = /c * x'^^. 
filter space problem compactly as 



,true 1 L 

we can write the 



L. (3) 

The simplest nontrivial filter space is gradient space, where 
L — 2 and /i = [1, — 1], /2 = [1, —1]^, but there are many 



other possibilities. Determining x from a filter space repre- 
sentation X often does not work well, so typically one obtains 
an estimate k of /c^^^ and deconvolves y with k to get x HI . 

Bayesian inference is a convenient framework for impos- 
ing prior assumptions to regularize blind deconvolution [51 . 
By assuming some distribution of n we obtain a likelihood 
function p(y | /c * x) which gives the probability that the data 
y arose from a given pair (/c, x). We then choose priors p{k) 
and p(x) and compute the posterior distribution 



p{k, X I y) (X p(y I /c * x.)p{x.)p{k) . 



(4) 



Estimates of x and k may be obtained by summary statis- 
tics on p{k^ X I y). We call the mode of p{k^ x | y) the joint 
maximum a posteriori (MAP) estimator, while the mode of 
the marginal p{k | y) = / p{k^ x | y)(ix is the kernel MAP 
estimator. Most blind deconvolution methods are nominally 
MAP estimators but do not actually find a global minimizer, 
as this is typically intractable and may even be counterproduc- 
tive. We refer to any method organized around optimizing a 
posterior as a MAP method, while methods that actually find 
a global minimum will be called ideal MAP methods. Joint 
MAP methods typically attempt to minimize the cost function 
x) = — logp(/c, X I y), which may be written (up to an 
irrelevant additive constant) as the sum of a data misfit and 
two regularization terms. 



F{k, x) = L{k * x) + i?x(x) + Rk{k), 



(5) 



where each of these functions may take the value +oo to rep- 
resent a hard constraint. Kernel MAP estimation is more dif- 
ficult as it involves a high-dimensional marginalization, and 
it is typically approximated by variational Bayes or MCMC 
sampling |5|. 

Joint MAP estimation is the oldest, simplest, and most 
versatile approach to blind deconvolution [6-8 1, but initial 
joint MAP efforts on the camera shake problem met with 
failure [|9l, even when £p regularizers for p < 1 were used. 
In [[11, Levin et al showed that the £p regularizer generally 
prefers blurry images to sharp ones: ||y ||^ < ||x^^^ ||^, so that 
ideal joint MAP typically gives the trivial no-blur solution 
{k^x) = {Sq^ y), where Sq is the Kronecker delta kernel. The 
non-ideal joint MAP methods [ lO^JJJ somewhat compensate 
for the defects of ideal joint MAP by dynamic edge prediction 
and likelihood weighting, but benchmarking in [111 121 showed 
that these heuristics sometimes fail. 



In ^ Fergus et al developed a kernel MAP method with 
a sparse edge prior which was very effective for correcting 
camera shake. In |1| it was noted that marginalization over 
X seems to immunize ideal kernel MAP against the blur- 
favoring prior problem. More refined kernel MAP methods 
were recently reported in [12] and [13], and to our knowledge 
these two methods are the top performers on the benchmark 
32 image test set from fTl. While these efforts have made 
kernel MAP much more tractable, it remains harder to under- 
stand and generalize than joint MAP, so it would be useful 
to find a joint MAP method that is competitive with kernel 
MAP on the camera shake problem. 

In |[T4ll . Krishnan et al addressed the blur-favoring prior 
problem in joint MAP by changing the prior, proposing the 
scale-invariant ^1/^2 ratio as a 'normalized' sparse edge 
penalty. The £2 normalization compensates for the way that 
blur reduces total ^l edge mass, causing the ^1/^2 penalty to 
prefer sharp images and eliminating the need for additional 
heuristics. While their algorithm does not quite match the 
performance of [9| on the benchmark test set from [T|, it 
comes fairly close while being significantly simpler, faster, 
and in some cases more robust. Other promising joint MAP 
methods include 1 15 - 17], but we are not aware of public code 
with full benchmark results for these methods. 

1.1. Our approach 

We propose a new approach to joint MAP blind deconvolution 
in which the kernel is estimated from a sparse approximation 
X of the sharp gradient map x^^^. Initially we constrain x to 
be very sparse, so it contains only the few strongest edges in 
the image, and we determine k such that /c*x ^ y. Because x 
is so much sparser than y , the solution /c = is very unlikely. 
But generally this initial k overestimates /c^^^^, so we refine k 
by letting weaker edges into x. 

To present this approach formally, we set /i = [1 , — 1] , /2 = 
[1,-1]^, so that x(p) = {xi{p)^X2{p)) is the discrete 
image gradient vector at each pixel p. We set L(/c,x) = 
\ * X — y||2 and impose the usual positivity and unit sum 
constraints on k. We measure gradient sparsity using the ^2,0 
norm: |x(p)| is the ^2 length of x(p) and ||x||2 q = II 1^1 II 
the number of nonzero gradient vectors. The joint MAP 
optimization problem is then 

minimize |||/c*x — y||2 

subject to /c > 0, = 1, ||x||2o^'?"7 

where the expression oFh denotes the dot product of the ar- 
rays a and h when considered as vectors, and the 1 in is 
an all-ones array. 

We solve this problem with an iterative optimizer de- 
scribed in ^ and slowly increase r as the iterations proceed. 
To initialize r we use the ^1/^2 ratio, a robust lower bound 



on a signal's sparsity ifTSl . The sharp x should be signifi- 
cantly sparser than y, so initially we set r = /^oTy, where 

— llly|lli / Illy|ll2 /3o < 1 is a small constant. After 
an initial bum-in period of iterations we multiply r by a 
constant growth factor 7 > 1, an action we repeat every Is 
iterations thereafter. 

We use a standard multiscale seeding technique to acceler- 
ate the kernel estimation step |9, 14J. We begin by solving © 
with a heavily downsampled y , giving a cheap, low-resolution 
approximation to k and x. We then upsample this approxima- 
tion and use it as an initial guess to solve © with a higher 
resolution y, repeating the upsample-and-seed cycle until we 
reach the full resolution y. At each scale we use the same r 
increase schedule. After kernel estimation we use non-blind 
deconvolution of y with k to get the sharp image x. 

The easiest way to understand how our kernel estimation 
works is to watch k and x evolve as the iterations progress. 
In Fig. [11 the state of k and x is shown at iterations 2, 32, 
and 150 of the final full-resolution scale, with /c^^^ and x^^^ 
at far right. Initially x is quite sparse, so k cannot be a trivial 
kernel because the parts of y not in x must be attributed to 
blur. But this initial approximation is crude, so as r increases 
with iteration, x is allowed to have more and more edges so 
that k can be refined. 

1.2. Novelty and relations with existing methods 

Direct ^0 optimization is well-established in the compressed 
sensing community 1 19 ,20 1 but we are not aware of any effec- 
tive ^0 approaches to blind deconvolution. In |[T4ll the ^1/^2 
ratio was deliberately chosen over ^Q because while both have 
the desired scale invariance, the graph of ^1/^2 is smoother 
and looks more 'optimizable' than ^Q. We contend that 
may be difficult to use as a cost function, but very effective as 
a constraint. Gradient and kernel thresholding are commonly 
used lUniliri and these can be interpreted as ^0 projections, 
but they are typically used as auxiliary heuristics, not as the 
central modeling idea. Our technique of slowly increasing 
the sparsity constraint r is reminiscent of matching pursuit 
algorithms for sparse approximation [|2T]|22]. It is also re- 
lated to the likelihood reweighting technique of fTOl, which 
may be seen by considering the Lagrangian of ©. However, 
our initialization strategy requires that we use the constrained 
formulation rather than the Lagrangian. 

2. ALTERNATING PROJECTED GRADIENT 
METHOD 

To solve problem © at a given scale, we use a standard al- 
ternating descent strategy: starting from some initial k and 
X, we reduce the cost function by updating x with k fixed, 
then k with x fixed, cycling until a stopping criterion is met. 
Each cycle, or outer iteration, consists of /x inner iterations 
updating x and Ik inner iterations updating k. All updates are 





Fig. 1. Kernel estimation on an image from the test set of [1 J; a small patch has been selected and rescaled for clarity. Left: 
Blurry edge map |y|. Center left to center right: Evolution of the kernel k (inset) and edge map magnitude |x| in the final 
full-resolution stage. As r increases in (|6]), the edge map becomes less sparse and the kernel is refined. Right: k}^^ and |x^^^ |. 



computed with a projected gradient method; given a smooth 
function h{u) and a constraint set projected gradient meth- 
ods seek a solution of min^^^ h{u) by updates of the form 
u ^ Vu{u — OiuQu)^ where Qu = Vh{u), au is a step size, 
diWdVui"^) = argmin — k;||2 is the Euclidean projec- 

tion of w onto U. Convergence of alternating descent and pro- 
jected gradient methods to stationary points is proven in [|23ll 
under mild conditions. 

We now describe how we compute the projected gradi- 
ent iterations for the inner subproblems miiikeic L{k^ x) and 
minxGA" ^(^, x), where L(/c,x) = ^ * x — y||2, /C = 
{/c I /c > 0, l^k = 1}, and A' = {x | ||x||2 q < r}. Letting 
r = /c*x — y denote the residual, we have V/cL = x^ 
and Vx^ = ^ * r, where the bar denotes 180° rotation about 
the origin. Assuming the nonzero elements of |x| are distinct, 
the projection Vx{'^) is the top-r vector thresholding 



P;,(x)(i)=x(i).l(|x(i)|>e(|x|,r)), 



(7) 



where 1(A) is the indicator function for condition A and 
6>(|x|,r) is the r^^ biggest element of |x|. The set /C is a 
canonical simplex with projection V)c{k) given by 



PK:{k){i) = max(0, k{i) — <j), 



(8) 



where a is the unique solution of l^P;c(^) = 1- Both Px and 
P}C can be computed in linear time using selection algorithms 

The step sizes ax,<^/c are chosen by backtracking line 
search from an initial guess. In the x subproblem our initial 
guess is 



{k^ gy,Y{k^ g^y 



(9) 



which is optimal in the sense that it solves the problem 
minQ, L{k^^ — ag^^). This aggressive step size was chosen 
over several alternatives, as it was the most effective for se- 
curing good edge support estimates. In the k subproblem we 
use the spectral projected gradient (SPG) method [26|; in the 
first iteration a/c = 1, and in subsequent iterations we use the 



Barzilai-Borwein step size 

{9k-grY{gu-gr) 
(gk-g^y^'Vik-koid) 



Oik 



(10) 



where g^^^ and k^^^ denote the values of gk and k at the pre- 
vious SPG iteration. 



3. IMPLEMENTATION AND EXPERIMENTS 

We implemented our method in MATLAB by modifying the 
code of [14J, which uses a similar strategy of alternating min- 
imization with multiscale seeding. The full-resolution kernel 
size was set to 35 x 35 for all experiments. The initial stage of 
the multiscale algorithm downsamples y by a factor of 5/35 
in each direction, so that the kernel is of size 5x5, and each 
upsample cycle increases the size of /c, x, and y by a factor of 
roughly V2 until full resolution is reached. The parameters of 
the core single-scale algorithm from ^were set to /3o = 0.15, 
7 = LIO, h = 20, Is = 10, /x = 1, //c = 6. We do 30 it- 
erations of the alternating projected gradient method for all 
scales except the final, full-resolution scale, which uses 1 80 
iterations. Non-blind deconvolution with the estimated kernel 
was performed using the method of [27 1, using the parameter 
settings chosen in the code for llT2l . 

In [IJ a test set of 32 blurry images with known ground 
truth was created for benchmarking blind deconvolution 
methods. Each blurry image was formed by taking a pic- 
ture of a sharp image with a camera that shook in-plane, and 
bright points outside the image were used to obtain ground 
truth blur kernels. A total of 32 blurry images were formed 
by blurring 4 sharp images on 8 different shake trajectories. 
This test set has become the de facto standard for objectively 
comparing different methods. 

We ran our algorithm on this test set and compared its 
performance against the methods of [12] and [13|. We com- 
pare against these methods because they have published im- 
plementations which match or exceed the performance of the 
state-of-the-art methods in ll9l- [TnfT4ll . and we know of no 
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Fig. 2. Sample results from our method, ||I2][T3] on the benchmark set of Q. True and recovered kernels inset. 
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Fig. 3. Cumulative deblurring performance our method, [T2l, 
and fm on the 32 image test set of [1 J. The vertical axis is 
the percentage of the 32 runs having at most a given SSE. 



methods that outperform [[T2l and [[T3l on this test. We use 
the squared error metric SSE(x) = — x^^^(z))^ to 

measure performance and note that results using the ratio met- 
ric of im are similar. Results for [|T2l were taken from files 
included with their published implementation, while results 
for 1 13 1 were generated by running their online test script us- 
ing the log prior, which was the best in their experiments. 

Our experiments were performed in MATLAB 2011b on 
an Intel Quad Core Xeon 2.2 GHz Mac Pro. Our method's 
kernel estimation step took 45 — 60 seconds per image, and 
deconvolution took 15 seconds. The other methods took 45 — 
240 seconds for kernel estimation, and their computation time 
depended strongly on kernel size. The difference is mostly 
due to our use of cheap SPG iteration rather than quadratic 
programming in the k step, and also because V)c and Vx 
make k and x sparse, enabling /c * x to be computed faster. 

Sample results on the benchmark are shown in Fig.|2] Our 
method and [121 perform very similarly on both images. On 
the house image |[T3ll is very sharp and by far the best, but 
it suffers from severe artifacts on the boy image. Fig. [3] sum- 
marizes the full-benchmark performance of the three methods 



using cumulative error histograms. The curves for our method 
and |[T2l are largely similar. The curve for [T3l is above ours 
and |[T2il for about half of the images, but it flattens out be- 
low 85% while the others plateau at 100%. This is because 
method 1 13 | struggled on the boy images. We note that the 
results reported in [13] for this test set are better than those 
obtained in our run of their code, although we ran it without 
any modification. The authors of |13 | state that their code 
is a simplification of what was used to generate the reported 
results. While there may be a more sophisticated version of 
their code that outperforms ours, our method competes with 
the available state of the art. 

4. CONCLUSION 

We have proposed a blind deconvolution method in which the 
blur kernel is estimated by incremental sparse edge approx- 
imation. A rough global blur kernel is first estimated from 
only the strongest edges in the image, then it is refined as we 
allow the image edge map to gradually become less and less 
sparse. Ours is the first simple, fast joint MAP method to 
match the state-of-the-art kernel MAP methods in |[l2l[T3l on 
an objective benchmark. The success of the methods in lfT4ll 
and this paper suggest that the downsides of ideal joint MAP 
described in |[T1 can be robustly avoided without resort to a 
more complex kernel MAP estimation. 

There are many potential avenues for improving and ex- 
tending our method. The edge sparsity relaxation schedule we 
use is slow and conservative, and a more adaptive schedule 
could make the method faster. Our initialization of the edge 
map sparsity does not take noise into account, and may need 
to be modified for very noisy images. Extension to nonuni- 
form blur models, nonquadratic likelihoods, and fast parallel 
or GPU implementations are possible. The speed of our ker- 
nel estimation may make it useful as an input to high-quality 
non-blind methods such as LI 3.1 . 
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